Augmenting Mathematical Formulae for More Effective Querying & Presentation

Author

  • Moritz Schubotz
Abstract

Scientists and engineers regularly search for well-established mathematical concepts expressed as mathematical formulae. Conventional search engines today focus on keyword-based text search. An analogous approach does not work for mathematical formulae: knowledge about identifiers alone is not sufficient to derive the semantics of the formula they occur in. Currently, the solution for formula-related inquiries is to consult domain experts, which is slow, expensive and non-deterministic. Consequently, core concepts that enable formula-related queries on potentially large datasets are needed. While earlier attempts addressed the problem as a whole, I identify three mutually orthogonal challenges to formula search.

The first challenge, content augmentation, is to collect the full semantic information about an individual formula from a given input. Most fundamentally, this might start with the digitization of analogue mathematical content. It covers the conversion from imperative typesetting instructions (e.g. TeX) to declarative layout descriptions (e.g. presentation MathML), but also deals with inferring the syntactical structure of a formula (i.e. the expression tree, often represented in content MathML). In addition, this first challenge involves associating formula metadata such as constraints, identifier definitions, related keywords or substitutions with individual formulae.

The second challenge is content querying. This ranges from query formulation through query processing, the actual search and hit ranking to result presentation. There are different forms of formula queries. In standard ad-hoc retrieval queries, a user defines the information need and the math information retrieval system returns a ranked list for a particular data set. Similar are interactive formula filter queries, where a user filters a data set interactively until she arrives at a result set that is relevant to her needs. Different are unattended queries that run in the background to assist authors during editing, or to help readers identify related work while viewing a certain formula.

The third challenge is content indexing for growing data sets. This challenge includes the scalable execution of the solutions to the two aforementioned challenges. While well-established techniques from the area of database systems, e.g. XML processing and indexing, can be applied, math-specific complexity problems require individual solutions.

Augmented content (challenge 1) opens up additional options for similarity search, and potentially improves the search results regardless of the applied similarity measure. In order to separate the effect of content augmentation from intrinsic improvements …
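As a rough illustration of how the first two challenges fit together, the following minimal Python sketch stores each formula as an expression tree with attached metadata and answers a structural query by subtree matching. It is not the method of the paper: the names `Node`, `AugmentedFormula`, `matches` and `search`, the metadata fields, and the `?`-prefixed wildcard convention are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """One node of an expression tree: a symbol with ordered children."""
    symbol: str
    children: tuple = ()


@dataclass
class AugmentedFormula:
    """A formula plus hypothetical metadata gathered during augmentation."""
    tree: Node
    identifier_definitions: dict = field(default_factory=dict)
    keywords: list = field(default_factory=list)


def subtrees(node):
    """Yield the node itself and every descendant, depth first."""
    yield node
    for child in node.children:
        yield from subtrees(child)


def matches(pattern, node):
    """Structural match; a pattern symbol starting with '?' matches any subtree.

    Simplification: repeated wildcards are not required to bind consistently.
    """
    if pattern.symbol.startswith("?"):
        return True
    if pattern.symbol != node.symbol or len(pattern.children) != len(node.children):
        return False
    return all(matches(p, n) for p, n in zip(pattern.children, node.children))


def search(pattern, corpus):
    """Return the formulae containing at least one subtree that matches."""
    return [f for f in corpus if any(matches(pattern, s) for s in subtrees(f.tree))]


def power(base, exponent):
    """Helper to keep the example trees readable."""
    return Node("power", (Node(base), Node(exponent)))


# E = m * c^2
energy = AugmentedFormula(
    tree=Node("eq", (Node("E"), Node("times", (Node("m"), power("c", "2"))))),
    identifier_definitions={"E": "energy", "m": "mass", "c": "speed of light"},
    keywords=["mass-energy equivalence"],
)

# a^2 + b^2 = c^2
pythagoras = AugmentedFormula(
    tree=Node("eq", (Node("plus", (power("a", "2"), power("b", "2"))), power("c", "2"))),
    identifier_definitions={"a": "leg", "b": "leg", "c": "hypotenuse"},
    keywords=["Pythagorean theorem"],
)

corpus = [energy, pythagoras]

# Query: anything multiplied by a square.
query = Node("times", (Node("?x"), Node("power", (Node("?y"), Node("2")))))
hits = search(query, corpus)
```

Both example formulae use the identifier `c`, yet only the stored definitions distinguish the speed of light from a hypotenuse, which is the kind of information that identifiers alone do not carry. A linear scan like `search` would not scale to growing data sets; that is the concern of the third challenge.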


Similar resources

A Search Engine for Mathematical Formulae

We present a search engine for mathematical formulae. The MathWebSearch system harvests the web for content representations (currently MathML and OpenMath) of formulae and indexes them with substitution tree indexing, a technique originally developed for accessing intermediate results in automated theorem provers. For querying, we present a generic language extension approach that allows construc...


A New High Order Closed Newton-Cotes Trigonometrically-fitted Formulae for the Numerical Solution of the Schrodinger Equation

In this paper, we investigate the connection between closed Newton-Cotes formulae, trigonometrically-fitted methods, symplectic integrators and efficient integration of the Schrödinger equation. Multistep symplectic integrators have been studied very little, although in the last decades several one-step symplectic integrators have been produced based on symplectic geometry (see the relevant lit...


MathWebSearch 0.5: Scaling an Open Formula Search Engine

MathWebSearch is an open-source, open-format, content-oriented search engine for mathematical formulae. It is a complete system capable of crawling, indexing, and querying expressions based on their functional structure (operator tree) rather than their presentation. In version 0.5, we concentrate on scalability issues in MathWebSearch to take advantage of corpora in the giga-formula range. We r...


Cost-Effective Combination of Multiple Rankers: Learning When Not To Query

Combining multiple rankers has potential for improving the performance over using any of the single rankers. However, querying multiple rankers for every search request can often be too costly due to efficiency or commercial reasons. In this work, we propose a more cost-effective approach that predicts the utility of any additional rankers, prior to querying them. We develop a combined measure ...


Indexing and Searching Mathematics in Digital Libraries

This paper surveys approaches and systems for searching mathematical formulae in mathematical corpora and on the web. The design and architecture of our MIaS (Math Indexer and Searcher) system is presented, and our design decisions are discussed in detail. An approach based on Presentation MathML using a similarity of math subformulae is suggested and verified by implementing it as a math-aware...




Publication date: 2016